DeepHuMS: Deep Human Motion Signature for 3D Skeletal Sequences
3D human motion indexing and retrieval is an interesting problem, driven by the
rise of data-driven applications that analyze and/or re-utilize 3D human
skeletal data, such as data-driven animation, sports biomechanics analysis, and
human surveillance. Spatio-temporal articulations of humans, noisy/missing
data, and different speeds of the same motion make the problem challenging.
Several existing state-of-the-art methods use hand-crafted features together
with optimization-based or histogram-based comparison to perform retrieval,
and they demonstrate results only on very small datasets with few classes. We
make a case for a learned representation that both recognizes the motion and
enforces a discriminative ranking. To that end, we propose a 3D human motion
descriptor learned with a deep network. Our learned embedding is generalizable
and applicable to real-world data, addressing the aforementioned challenges,
and further enables sub-motion search in its embedding space using another
network. Our model exploits inter-class similarity using trajectory cues and
performs far better in a self-supervised setting. State-of-the-art results on
all these fronts are shown on two large-scale 3D human motion datasets: NTU
RGB+D and HDM05.
Comment: Under Review, Conference
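The retrieval setting described above — embed a skeletal sequence with a learned network, then rank gallery motions by similarity in the embedding space — can be sketched in miniature. The sketch below is a hypothetical stand-in, not the DeepHuMS architecture: the "network" is just temporal mean-pooling plus a random linear projection, and retrieval is cosine similarity over L2-normalized embeddings.

```python
import numpy as np

def embed_sequence(seq, W):
    """Map a (T, J*3) skeletal sequence to a fixed-size descriptor.

    Toy stand-in for a learned deep descriptor: temporal mean-pooling
    followed by a linear projection and L2 normalization, so that
    retrieval reduces to cosine similarity in the embedding space.
    """
    pooled = seq.mean(axis=0)              # (J*3,) temporal average
    z = W @ pooled                         # (D,) linear projection
    return z / (np.linalg.norm(z) + 1e-8)  # unit-norm embedding

def retrieve(query, gallery):
    """Rank gallery embeddings by cosine similarity to the query."""
    sims = gallery @ query                 # dot product = cosine (unit norm)
    return np.argsort(-sims)               # best match first

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 75))          # D=32; 25 joints * 3 coords
gallery_seqs = [rng.standard_normal((60, 75)) for _ in range(5)]
gallery = np.stack([embed_sequence(s, W) for s in gallery_seqs])
ranking = retrieve(gallery[2], gallery)    # query with gallery item 2
```

In the paper's setting the projection would be a trained network whose embedding also enforces a discriminative ranking (e.g. via a ranking loss), but the retrieval step itself stays this simple.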
Deformable pose traversal convolution for 3D action and gesture recognition
The representation of 3D pose plays a critical role in 3D action and gesture recognition. Rather than representing a 3D pose directly by its joint locations, in this paper we propose a Deformable Pose Traversal Convolution Network that applies one-dimensional convolution to traverse the 3D pose for its representation. Instead of fixing the receptive field when performing traversal convolution, it optimizes the convolution kernel for each joint by considering contextual joints with various weights. This deformable convolution better utilizes the contextual joints for action and gesture recognition and is more robust to noisy joints. Moreover, by feeding the learned pose feature to an LSTM, we perform end-to-end training that jointly optimizes the 3D pose representation and temporal sequence recognition. Experiments on three benchmark datasets validate the competitive performance of our proposed method, as well as its efficiency and robustness to noisy pose joints.
NRF (Natl Research Foundation, S’pore)
Accepted version
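The core idea — a 1D convolution that walks a joint-traversal order, where each joint chooses its own contextual joints rather than a fixed receptive field — can be illustrated with a minimal sketch. This is an assumption-laden simplification, not the paper's network: offsets are hand-set integers instead of learned fractional offsets, and the kernel weights are shared across joints.

```python
import numpy as np

def traversal_conv(pose, offsets, weights):
    """Deformable 1D convolution along a joint-traversal order.

    pose:    (J, C) joint features listed in traversal order
    offsets: (J, K) per-joint integer offsets into the traversal
             (the 'deformable' part: each joint picks its own context)
    weights: (K, C) kernel weights applied to the gathered joints

    Returns (J, C) contextualized joint features.
    """
    J, _ = pose.shape
    out = np.zeros_like(pose)
    for j in range(J):
        for k, off in enumerate(offsets[j]):
            idx = int(np.clip(j + off, 0, J - 1))  # clamp at chain ends
            out[j] += weights[k] * pose[idx]
    return out

rng = np.random.default_rng(1)
pose = rng.standard_normal((25, 3))               # 25 joints, xyz features
offsets = np.tile(np.array([-1, 0, 1]), (25, 1))  # regular kernel everywhere...
offsets[10] = [-2, 0, 3]                          # ...deformed at joint 10
weights = np.full((3, 3), 1.0 / 3.0)              # simple averaging kernel
feat = traversal_conv(pose, offsets, weights)
```

With the regular offsets this reduces to an ordinary kernel-3 convolution over the joint chain; the deformed row at joint 10 shows how a noisy immediate neighbor can be skipped in favor of more informative contextual joints.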